chore: Make graph visualization more scalable #208
ericfe-google wants to merge 3 commits into main
Conversation
Summary of Changes: Hello @ericfe-google, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the scalability of the graph visualization feature by introducing intelligent data handling based on query result size. It avoids embedding excessively large datasets directly into the HTML, opting instead for a callback mechanism that fetches data from BigQuery for medium-sized results, and it imposes a hard limit for extremely large ones. This change improves performance and resource usage, making the visualization more robust across diverse data volumes.
Code Review
This pull request introduces a more scalable approach for graph visualization by conditionally embedding query results in the HTML based on their size. For larger results, it fetches them on-demand from a temporary BigQuery table. The changes include refactoring client creation logic for better code organization and adding new size estimation functions. My review focuses on a bug in the size estimation logic and a maintainability improvement in the server-side handling of on-demand data fetching. The added tests are comprehensive and cover the new functionality well.
Force-pushed from f7d3ba0 to 88ab993.
We now only include query results directly in the HTML when they are smaller than 100 KB. For larger query results, we store only a reference to the destination table in the HTML, and the Python code re-reads the query results from the destination table during the callback. We also added a hard limit of 5 MB on the query result size, beyond which graph visualization is not supported at all.
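As a rough illustration of that dispatch, here is a minimal sketch; the two size constants come from the diff below, but the `_render_*` helpers and the surrounding function are hypothetical placeholders, not this PR's actual code:

```python
# Illustrative sketch only: the _render_* helpers are made-up names standing
# in for the PR's real rendering paths.
def _choose_graph_payload(df, query_job):
    size = _estimate_json_size(df)
    if size > MAX_GRAPH_VISUALIZATION_SIZE:
        # Hard 5 MB cap: visualization is not supported at all.
        raise ValueError("Query result too large to visualize as a graph.")
    if size > MAX_GRAPH_VISUALIZATION_QUERY_RESULT_SIZE:
        # Medium results: embed only a reference to the destination table;
        # the Python callback re-reads the rows from BigQuery on demand.
        return _render_with_table_reference(query_job.destination)
    # Small results (under 100 KB): embed the rows directly in the HTML.
    return _render_with_inline_rows(df)
```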
```python
client_options=bigquery_client_options,
location=args.location,
bq_client = core.create_bq_client(
    args.project, args.bigquery_api_endpoint, args.location
```
Nit: since these are all string arguments (not that we've enabled type checking, anyway), passing by keyword could prevent accidentally passing the wrong value to the wrong parameter.
```diff
-    args.project, args.bigquery_api_endpoint, args.location
+    project=args.project,
+    bigquery_api_endpoint=args.bigquery_api_endpoint,
+    location=args.location,
```
| return " ".join(identities) | ||
|
|
||
|
|
||
| def create_bq_client(project: str, bigquery_api_endpoint: str, location: str): |
Even more optional: we could force these to be keyword arguments. That's a practice the Vertex team follows, which is helpful because it means we could theoretically reorder the arguments without breaking changes. I'm of mixed opinion about forcing that on users, but I do think it's a useful practice for internal functions like this.
```diff
-def create_bq_client(project: str, bigquery_api_endpoint: str, location: str):
+def create_bq_client(*, project: str, bigquery_api_endpoint: str, location: str):
```
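For illustration, with the keyword-only signature a positional call fails fast; the sketch below assumes the `core` module from this PR is importable, and the literal argument values are made-up placeholders:

```python
# With the keyword-only signature, a positional call raises immediately:
#   core.create_bq_client("my-project", "https://bigquery.googleapis.com", "US")
#   TypeError: create_bq_client() takes 0 positional arguments but 3 were given

# Callers must name every argument, so the parameters can later be reordered
# without breaking existing call sites:
bq_client = core.create_bq_client(
    project="my-project",
    bigquery_api_endpoint="https://bigquery.googleapis.com",
    location="US",
)
```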
```python
import copy
from google.api_core import client_info
from google.cloud import bigquery
import IPython  # type: ignore
from bigquery_magics import environment
import bigquery_magics.config
import bigquery_magics.version
```
PEP-8:
Imports should be grouped in the following order:
- Standard library imports.
- Related third party imports.
- Local application/library specific imports.
You should put a blank line between each group of imports.
https://peps.python.org/pep-0008/#imports
In this case:
```diff
 import copy
+
 from google.api_core import client_info
 from google.cloud import bigquery
 import IPython  # type: ignore
+
 from bigquery_magics import environment
 import bigquery_magics.config
 import bigquery_magics.version
```
```python
MAX_GRAPH_VISUALIZATION_SIZE = 5000000
MAX_GRAPH_VISUALIZATION_QUERY_RESULT_SIZE = 100000
```
Nit: I find it helpful to group the 0s in Python to better understand the scale at a glance. Also, "size" is pretty ambiguous; please rename the constants to include the units, for example a `_BYTES` suffix.
```diff
-MAX_GRAPH_VISUALIZATION_SIZE = 5000000
-MAX_GRAPH_VISUALIZATION_QUERY_RESULT_SIZE = 100000
+MAX_GRAPH_VISUALIZATION_BYTES = 5_000_000
+MAX_GRAPH_VISUALIZATION_QUERY_RESULT_BYTES = 100_000
```
```python
def _estimate_json_size(df: pandas.DataFrame) -> int:
    """Approximates the length of df.to_json(orient='records')
```
I know it's not a perfect estimate, but pandas provides `DataFrame.memory_usage` (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.memory_usage.html). Could we use that as the starting point instead? How accurate do you need to be?
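A minimal sketch of that alternative, assuming in-memory bytes are an acceptable proxy for serialized JSON length; the function body is illustrative, not this PR's implementation:

```python
import pandas


def _estimate_json_size(df: pandas.DataFrame) -> int:
    """Roughly estimate the size of df.to_json(orient="records") in bytes.

    memory_usage(deep=True) counts the actual payload of object (string)
    columns instead of just pointer sizes; index=False excludes the index,
    which orient="records" does not serialize anyway.
    """
    return int(df.memory_usage(index=False, deep=True).sum())
```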
```diff
@@ -0,0 +1,73 @@
+# Copyright 2024 Google LLC
```
```diff
-# Copyright 2024 Google LLC
+# Copyright 2026 Google LLC
```